ABSTRACT

The estimation of values of a latent trait presumed
to underlie a given set of item response data is made on the basis of
dichotomously scored items utilizing the so-called "normal ogive
model" of Lawley and Lord. This model provides an internal scale of
measurement, scores which are independent of the particular test
items employed, individual estimates of the standard error of each
subject's score, and a statistical test of how well the data conform
to the constraints of the model. Tables and graphs present the study
results. (Author/DB)
1. Introduction

Physical and physiological measurements are not generally subject to the
limitations inherent in psychological testing, where an unknown range of
individual variation is compressed into a relatively restricted distribution
of scores from a typically 10- to 40-item test. Such psychometric variables
produce raw score distributions which tend to be skewed and platykurtic,
their particular properties being dependent upon the difficulty and
discriminating power of the test items employed (Lord & Novick, 1968,
pp. 386-392). To make valid inferences about the nature of these quantitative
traits, especially by means of distributional analyses, it is apparent that we
need mental variables possessing better metric properties than is usually the
case. A theoretical solution for the hypothetical value of a trait or ability
presumed to underlie a given set of item-response data is provided by the
latent-trait psychometric models (Lord & Novick, 1968, Chs. 16-20; Rasch,
1960). In the present study we consider the estimation of these trait values
on the basis of n dichotomously scored items utilizing the so-called "normal
ogive model" of Lawley and Lord (Lawley, 1943; Lord, 1952; Bock & Lieberman,
1970). This model provides an internal scale of measurement, scores which are
independent of the particular test items employed, individual estimates of the
standard error of each subject's score, and a statistical test of how well the
data conform to the constraints of the model.
2. The Normal Ogive Model

Consider an unobservable, continuous variable, $\theta$, the "latent ability"
of the subjects, which is distributed normally in the population of reference
with a mean 0.0 and variance 1.0. Letting $r_{ij} = 1$ indicate a correct response
by subject i to a dichotomously scored item j, and $r_{ij} = 0$ otherwise, define

$$P_{ij} = \mathrm{Prob}\{r_{ij} = 1 \mid \theta_i\} = \Phi(c_j + a_j\theta_i) \tag{1}$$



where $\Phi$ is the cumulative normal distribution function,
$c_j$ is an index of the difficulty of item j,
and $a_j$ is an index of the discriminating power of item j.

Then if $v_i = [r_{ij}]$, the n x 1 score vector for a given subject with
ability $\theta_i$,

$$P(v_i) = \prod_{j=1}^{n} P_{ij}^{\,r_{ij}}\, Q_{ij}^{\,(1 - r_{ij})}, \quad \text{where } Q_{ij} = 1 - P_{ij}, \tag{2}$$

on the assumption of "local independence," i.e., that the probabilities
of a correct response to any two items for a given value of $\theta$ are
statistically independent of each other. (They are necessarily independent of
$\theta$ since $\theta$ does not vary.)
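Equations (1) and (2) can be illustrated with a minimal Python sketch. The function names (`phi`, `response_prob`, `likelihood`) are chosen here for illustration only and are not taken from the original computer program.

```python
import math


def phi(x):
    """Cumulative standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def response_prob(theta, c, a):
    """Equation (1): probability of a correct response to a single item."""
    return phi(c + a * theta)


def likelihood(responses, theta, cs, as_):
    """Equation (2): probability of a 0/1 score vector at ability theta,
    formed as a product over items under local independence."""
    value = 1.0
    for r, c, a in zip(responses, cs, as_):
        p = response_prob(theta, c, a)
        value *= p if r == 1 else (1.0 - p)
    return value
```

For two items with c = 0 and a = 1, a subject at theta = 0 answers each correctly with probability 0.5, so any particular two-item response pattern has probability 0.25.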

A discussion of the plausibility of the normal ogive response characteristic
can be found in Lord and Novick (1968, Ch. 16). However, the adequacy
of the model must be verified for a given sample of test data. A common
situation in which one would not expect a good fit is that in which subjects
guess at unknown answers, thus raising the lower asymptotic value of $P_{ij}$
considerably above zero. Equation (1) is easily generalized to include
this possibility:

$$P_{ij} = g_j\,[1 - \Phi(c_j + a_j\theta_i)] + \Phi(c_j + a_j\theta_i) = g_j + (1 - g_j)\,\Phi(c_j + a_j\theta_i) \tag{3}$$

where $g_j$ is a constant specifying the probability of a chance correct
response to item j when the answer is unknown.
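The effect of the guessing constant on the lower asymptote can be seen in a short sketch of equation (3). The name `guessing_prob` and the numerical values are illustrative assumptions, not values from the study.

```python
import math


def phi(x):
    """Cumulative standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def guessing_prob(theta, c, a, g=0.0):
    """Equation (3): response probability with chance-success constant g.
    With g = 0 this reduces to equation (1)."""
    return g + (1.0 - g) * phi(c + a * theta)


# For a subject of very low ability the probability approaches g, not zero.
low_ability = guessing_prob(-10.0, 0.0, 1.0, g=0.15)
```

At theta = 0 with c = 0, a = 1 and g = 0.2, the probability is 0.2 + 0.8(0.5) = 0.6.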



In general, the model requires that

(a) the test in question is measuring substantially one trait (i.e., a unifactor
test),

(b) the probability of answering a given item correctly increases
monotonically with the subject's level on the trait, and

(c) the principle of local independence given above holds.



3. Maximum Likelihood Estimation of Latent Ability and Item Parameters

The estimation of the parameters of the model may be approached from an
unconditional or conditional point of view, depending upon whether the subjects
are regarded as having been sampled from a specified population or are
treated as given entities (see Bock, 1972). The former approach has proven to
be extremely time consuming for tests of more than, say, ten items (Bock &
Lieberman, 1970). The latter leads to simultaneous estimation of both subject
and item parameters and has been adopted here because of its computational
efficiency.

A. Estimating ability when the item parameters are known 

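Letting the parameters of the model be defined as in section 2, and $P_{ij}$
be defined by equation (3), $P(v_i)$ in equation (2) is the likelihood function of
$\theta$ for a given subject. Omitting the i and j subscripts for convenience,

$$\ell = \log P(v) = \sum r \log P + \sum (1 - r) \log Q .$$

Letting $Y = c_j + a_j\theta$ and $h(Y)$ = the unit normal ordinate,

$$\frac{\partial \ell}{\partial \theta} = \sum \frac{(r - P)}{PQ}\,\frac{\partial P}{\partial \theta} = \sum \frac{(r - P)}{PQ}\, a\,(1 - g)\,h(Y) = 0 .$$

Also,

$$\frac{\partial^2 \ell}{\partial \theta^2} = \sum \frac{a^2 (1 - g)\,h(Y)}{PQ} \left[ (r - P)\left( -Y - \frac{(Q - P)(1 - g)\,h(Y)}{PQ} \right) - (1 - g)\,h(Y) \right]$$

since $\partial^2 P / \partial \theta^2 = a^2 (1 - g)\,h'(Y)$ and $h'(Y) = -Y\,h(Y)$.

Applying the Newton-Raphson method to any k-th stage estimate of $\theta$,

$$\theta_{k+1} = \theta_k - \left( \frac{\partial \ell}{\partial \theta} \right) \Big/ \left( \frac{\partial^2 \ell}{\partial \theta^2} \right) .$$

In the absence of guessing, of course, all computations are performed
with the $g_j$ set equal to zero.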
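The Newton-Raphson iteration for a single subject might be sketched as follows. This is an illustration under stated assumptions rather than the original program: the starting value of 0.0, the tolerance, the iteration limit, the bound of 6.0 on the estimate, and the clamping of P away from 0 and 1 are all arbitrary choices introduced here, and returning `None` stands in for "failed to converge."

```python
import math


def phi(x):
    """Cumulative standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def h(x):
    """Unit normal ordinate (standard normal density)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def derivatives(theta, responses, cs, as_, gs):
    """First and second derivatives of the log likelihood in theta."""
    d1 = 0.0
    d2 = 0.0
    for r, c, a, g in zip(responses, cs, as_, gs):
        y = c + a * theta
        p = g + (1.0 - g) * phi(y)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # assumption: avoid division by zero
        q = 1.0 - p
        k = (1.0 - g) * h(y)
        d1 += (r - p) * a * k / (p * q)
        d2 += (a * a * k / (p * q)) * (
            (r - p) * (-y - (q - p) * k / (p * q)) - k
        )
    return d1, d2


def estimate_theta(responses, cs, as_, gs, theta=0.0,
                   tol=1e-8, max_iter=50, bound=6.0):
    """Newton-Raphson estimate of theta; None if no convergence."""
    for _ in range(max_iter):
        d1, d2 = derivatives(theta, responses, cs, as_, gs)
        if d2 >= 0.0:
            return None  # log likelihood not concave at this point
        step = d1 / d2
        theta -= step
        if abs(theta) > bound:
            return None  # estimate is running off to infinity
        if abs(step) < tol:
            return theta
    return None
```

A subject who answers every item correctly (or incorrectly, when all g are zero) has no finite maximum likelihood estimate, so the sketch returns `None` in that case; this corresponds to the subjects who must be given a default value.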

B. Estimating item parameters when ability is known

Given the values of $\theta_i$, subjects of similar ability can be grouped to
provide an empirical estimate of the proportion of correct responses to
each item at intervals along the ability continuum. Item parameters can
then be estimated by means of probit analysis (Finney, 1971; Bock & Jones,
1968). This solution is presented in detail in Kolakowski & Bock (1970).
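The grouping step can be sketched in Python as below. Note the substitution: a full probit analysis fits the line by iteratively weighted maximum likelihood, whereas this sketch uses an unweighted least-squares line through the probit-transformed proportions (without the guessing constant), which is a deliberate simplification. The function names and the half-count adjustment applied to proportions of exactly 0 or 1 are assumptions made for illustration.

```python
from statistics import NormalDist, mean


def group_into_fractiles(thetas, responses, n_groups):
    """Sort subjects by ability, split them into n_groups groups, and return
    (mean ability, proportion correct) for each group on one item."""
    order = sorted(range(len(thetas)), key=lambda i: thetas[i])
    size = len(order) // n_groups
    groups = []
    for k in range(n_groups):
        stop = (k + 1) * size if k < n_groups - 1 else len(order)
        members = order[k * size:stop]
        m = len(members)
        prop = sum(responses[i] for i in members) / m
        # Assumption: keep proportions off 0 and 1 so the probit is finite.
        prop = min(max(prop, 0.5 / m), 1.0 - 0.5 / m)
        groups.append((mean(thetas[i] for i in members), prop))
    return groups


def fit_probit_line(groups):
    """Unweighted least-squares fit of probit(proportion) = c + a * ability."""
    xs = [x for x, _ in groups]
    zs = [NormalDist().inv_cdf(p) for _, p in groups]
    x_bar, z_bar = mean(xs), mean(zs)
    a = (sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs))
         / sum((x - x_bar) ** 2 for x in xs))
    c = z_bar - a * x_bar
    return c, a
```

When the group proportions follow the normal ogive exactly, the fitted line returns the generating values of c and a; with sampled data the unweighted fit gives only a rough approximation to the probit solution.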

C. Estimating ability and item parameters simultaneously

The above solutions for each set of parameters are developed in terms of
the other set. A computer program has been developed to estimate each set in
turn, iterating until convergence is reached (Kolakowski & Bock, 1970). Four
to six estimation cycles usually produce stable values. Because the origin
and unit of measure are arbitrary, the subject parameters are standardized
to zero mean and unit variance and all items are calibrated relative to this
metric.
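Because only the quantity c_j + a_j θ_i enters the model, standardizing the abilities can be offset exactly by rescaling the item parameters. A minimal sketch follows; the function name `standardize` and the use of the population (divisor n) standard deviation are assumptions made here.

```python
from statistics import mean, pstdev


def standardize(thetas, cs, as_):
    """Rescale abilities to zero mean and unit variance, adjusting c and a
    so that every c + a * theta is left unchanged."""
    m = mean(thetas)
    s = pstdev(thetas)
    new_thetas = [(t - m) / s for t in thetas]
    new_cs = [c + a * m for c, a in zip(cs, as_)]
    new_as = [a * s for a in as_]
    return new_thetas, new_cs, new_as
```

The adjustment follows from c + aθ = (c + am) + (as)(θ - m)/s, so the fitted response probabilities are identical before and after standardization.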
The $g_j$ are presently treated as constants which must be determined by
inspection. Subjects for whom the procedure will not converge are assigned
a default value and, in the present investigation, are eliminated from subsequent
analysis. The number of groups or fractiles used in partitioning the subjects
for the probit analysis is arbitrary.



4. The Problem of Bias in the M. L. Estimate of Ability 
A. Generation of synthetic item responses. 

Recall that

$$P_{ij} = \mathrm{Prob}\{r_{ij} = 1 \mid \theta_i\} = g_j + (1 - g_j)\,\Phi(c_j + a_j\theta_i) .$$

Assuming constant values for the four parameters of the model, synthetic
response data can be generated by sampling a number $n_{ij}$ between 0.0 and 1.0
from the rectangular distribution and assigning the values

$$r_{ij} = \begin{cases} 1 & \text{for } n_{ij} \le P_{ij} \\ 0 & \text{otherwise.} \end{cases}$$

This algorithm was performed using 38 previously calibrated test items, for
values of $g_j$ = 0.0 and 0.15, and a sample of 750 random normal deviates $\theta_i$,
hereafter referred to as "true scores." Estimates of these true scores, $\hat\theta_i$,
and of the original item parameters, $a_j$ and $c_j$, were then recovered from both
sets of response data using 20 fractiles and an empirical prior. Execution
time for runs of six complete estimation cycles on an IBM 360/65 computer
was under 4.5 minutes.
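The generation of synthetic responses can be sketched as follows. The 38 item parameters used below (difficulties evenly spaced from -1.5 to 1.5, all discriminations equal to 1.0) and the random seed are hypothetical placeholders; the calibrated item values actually used in the study are not reproduced here.

```python
import math
import random


def phi(x):
    """Cumulative standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate_responses(thetas, cs, as_, gs, rng):
    """Return one row of 0/1 item responses per subject."""
    data = []
    for theta in thetas:
        row = []
        for c, a, g in zip(cs, as_, gs):
            p = g + (1.0 - g) * phi(c + a * theta)
            # Draw from the rectangular distribution and compare with p.
            row.append(1 if rng.random() <= p else 0)
        data.append(row)
    return data


rng = random.Random(1972)                         # arbitrary seed
thetas = [rng.gauss(0.0, 1.0) for _ in range(750)]  # "true scores"
cs = [-1.5 + 3.0 * j / 37 for j in range(38)]     # hypothetical difficulties
as_ = [1.0] * 38                                  # hypothetical discriminations
data = simulate_responses(thetas, cs, as_, [0.15] * 38, rng)
```

Replacing the list of guessing constants with zeros gives the corresponding data set without chance responses.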

B. Comparison of distributional forms without guessing. 

For maximum sensitivity to the distributional forms, five tests of normality
were employed: the coefficients of skewness and kurtosis, the U-statistic
= ratio of sample range to std. dev. (David et al., 1954), Geary's A =
ratio of mean deviation to std. dev. (Geary, 1947), and a Chi Square
test on 18 degrees of freedom. Table 1 presents these indices for the
distribution of true scores and that of the resultant raw scores, thus illustrating
the unacceptable properties of the latter.
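Four of these indices are simple functions of the sample moments and can be sketched as below. The choice of divisors is an assumption made here (n for the moment coefficients and for Geary's ratio, n - 1 for the standard deviation in the range ratio), and kurtosis is reported as excess over the normal value of 3; the original computations may have used different conventions.

```python
import math


def shape_indices(xs):
    """Skewness, excess kurtosis, range/s.d. ratio, and Geary's ratio."""
    n = len(xs)
    m = sum(xs) / n
    devs = [x - m for x in xs]
    m2 = sum(d ** 2 for d in devs) / n
    m3 = sum(d ** 3 for d in devs) / n
    m4 = sum(d ** 4 for d in devs) / n
    sd_unbiased = math.sqrt(sum(d ** 2 for d in devs) / (n - 1))
    return {
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,
        "u": (max(xs) - min(xs)) / sd_unbiased,
        "geary": (sum(abs(d) for d in devs) / n) / math.sqrt(m2),
    }
```

For a normal distribution the skewness and excess kurtosis are zero and Geary's ratio is close to sqrt(2/pi), about 0.80; departures from these values indicate skewed, leptokurtic, or platykurtic forms.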

Table 2(a) presents our results for the recovered estimates assuming the
$g_j$ = 0.0, i.e., no chance responses. Although all of the ability distributions
have a mean of zero and variance of one by construction, the form of the
distribution of the $\hat\theta_i$ is leptokurtic and skewed to the right (Fig. 1), indicating
that subjects of high ability receive inflated trait estimates. This is
explained by referring to the graph of original vs. recovered item parameters
(Figure 2), in which it is apparent that the easiest and most discriminating
items are estimated as being even more extreme, thus defining a lower bound
for ability, but having little weight in most calculations. On the other hand
there is very little bias in the $\hat a_j$ and $\hat c_j$ of easy items. The net result is
a relative contraction of the left tail of the distribution.

A systematic correction for such asymmetrical bias is difficult to conceive.
However, the loss of a small number of unrealistically extreme subjects in the
context of a distributional analysis can be tolerated. Therefore, since
there were no true scores beyond the range of approximately ±3 standard
deviations, we accordingly removed the five subjects whose trait estimates had
an absolute value greater than 3.0. Table 2(b) shows that the distribution
of remaining subjects does not significantly differ from normality on any
of the five indices.

Similar analyses were performed for subtests of 10 and 20 items,
selected to uniformly span the entire range of difficulty. The program failed
to converge to stable parameter estimates for a 10-item test. Apparently,
this is too few items to adequately describe an underlying normal distribution,
even with such a large number of subjects, and thus confirms the futility of
unconditional estimation with only a handful of items (see Bock, 1972).

The results for a 20-item test were similar to those for the 38 items 
(Table 2(c)), with a stronger upward bias than was the case for the longer 
test. Hence, the possibility exists that the use of large item pools could 
itself improve the validity of ability estimates. 

Given our privileged knowledge of the true score distribution, the
original analysis was performed again assuming a normal prior rather than 
the usual empirical prior. It can be seen in Table 2(d) that the results for 
the two approaches are virtually identical. This is not surprising because, 
whereas the normal prior fits the data more precisely, the extreme cases 
(in both tails) are given considerably more weight than the moderate subjects. 

Lastly, an analysis was performed assuming, contrary to fact, that the
"subjects" might have been guessing. Here the procedure failed to converge
for each of three reasonable sets of guessing constants, each subjectively 
determined from an examination of the item response proportions in the 20 
fractiles. This tends to indicate that any results obtained under the guessing 
model when this assumption is unwarranted will undoubtedly be invalid. 



C. Comparison of distributions for data containing chance responses

The results of the analysis of the synthetic guessing data are presented
in Table 3 and Figure 3 for both the guessing and conventional options of the
computer program. Whereas removing the extreme $\hat\theta_i$ from these distributions
eliminated the original leptokurtosis, they remain significantly skewed to
the right, although not nearly as extreme. Comparing the two response models in
Table 3 reveals that the guessing analysis is decidedly less skewed, and
therefore more valid, when chance responses are in fact present in the data.
However, the sensitivity of these tests will be better appreciated by referring
to Figure 3 for a subjective evaluation of the differences between models.
In conclusion, the normal ogive guessing model should be employed
when chance responses are likely to be present in the data, but failure of
the conventional model to converge for guessing data (and not vice versa)
indicates that the procedure will have the most validity in applications
where guessing can be ruled out. In any case, if the present methodology
is found to be valid for a variety of prior distributions, provision of
suitable default values for unrealistically high ability estimates and the
use of subjectively determined guessing constants might still allow generation
of pools of calibrated items and the implementation of sequential item
testing under consistent, if not ideal, conditions.

5. Resolution of a Spatial Visualization trait distribution into normal components. 

An empirical problem with data meeting the above ideal criteria involved
making an inference about the mode of inheritance of an educationally important
mental trait, spatial visualizing ability, by contrasting the properties of
the separate ability distributions for the sexes. A 29-item audio-visual
version of the Guilford-Zimmerman (1953) Spatial test was administered to
a sample of 727 eleventh-grade students. The Normal Ogive latent ability
estimates were obtained under the conventional model and the forms of the
distributions were analysed for the sexes separately. Table 4 (a & b) shows
the results after removing extreme cases. Our first-hand knowledge of the data,
plus the fact of convergence of the parameter estimates under the conventional
model, lead us to place considerable faith in the validity of this analysis.

A maximum likelihood decomposition of these distributions into normal
components by the method of Day (1969) yields the results in Table 5 (a & b),
namely an upper component comprising 51% of the variation in boys' spatial
ability which corresponds to a similar component comprising only 20% of
the variance for girls. Given the range of ability estimates from -2.0 to
+3.0, the means of .80 and .68 of these components, respectively, are virtually
equal. To deal objectively with the significance of the findings, a likelihood
ratio test on 2 degrees of freedom was calculated to test the fit of only
one component. Also, a Pearsonian $\chi^2$ on 16 degrees of freedom was used to
check the adequacy of a bimodal model. These indices (Table 5) verified
that the deviation from normality shown in Table 4 is due to the presence of
two and only two underlying components. This structure is illustrated in
Figure 4 against the background of the frequency histograms for the data.

The existence of a sex-differentiating duality in the distribution
of a continuous human variable is compelling evidence for a sex-influenced
major gene. The above proportions immediately suggest an X-linked recessive
allele with frequency close to 0.5. Moreover, the (assumed) common variance
of the components for girls is estimated at nearly one half the magnitude
of that for boys (Table 5), suggesting an averaging effect in females which
does not occur in males. This is entirely consistent with the hypothesis of
sex-linkage and decidedly reinforces the correlational evidence for this model
(Stafford, 1961; Hartlage, 1970; Kolakowski, 1970).
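The arithmetic behind this suggestion can be made explicit. Assuming random mating (Hardy-Weinberg proportions, an assumption introduced here and not stated in the text), a recessive X-linked allele of frequency q is expressed in males, who carry one X chromosome, with probability q, and in females, who carry two, with probability q squared:

```latex
\begin{align*}
\text{males (one X):}    \quad & \Pr(\text{expressed}) = q = 0.5 \\
\text{females (two X):}  \quad & \Pr(\text{expressed}) = q^2 = 0.25
\end{align*}
```

These expected proportions may be compared with the estimated upper components of 51% for boys and 20% for girls. Taken alone, the female proportion of 0.20 would correspond to q = sqrt(0.20), or about 0.45.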

6. Discussion 

While the importance of a latent-trait measurement model for validly
investigating the mode of inheritance of an intellectual ability is apparent,
it is equally clear that we need to be able to objectively select one of several
conflicting models without resort to considerations external to the estimation
problem. Internal corrections for bias and/or the simultaneous estimation of
guessing parameters for the Normal Ogive model are two as yet unrealized
approaches which would correct the weaknesses in the present investigation.
On the other hand, other psychometric models based upon the logistic
distribution may be more promising in this regard (see Birnbaum, 1968; Rasch,
1960).

For instances in which chance responses can be eliminated on external
grounds, the assumption of normality of the components is still open to 
scrutiny. Lord (1960) has shown that errors of measurement cannot be assumed 
to be normally distributed if a subject’s score is taken to be the number 
of items answered correctly. Latent traits being continuous and unbounded, 
however, this assumption is at least plausible. It therefore remains to 
investigate the bias of the foregoing procedures for a variety of true score 
distributions or better yet, to specify theoretically the conditions under 
which unbiased estimates can be expected to obtain. 
FIGURE 1: Distribution of recovered estimates for the conventional
FIGURE 3: Distributions of recovered estimates from synthetic guessing data
FIGURE 4: SPATIAL VISUALIZATION ABILITY SCORE DISTRIBUTIONS



